Adversarial Attacks on Online Learning to Rank with Click Feedback

Neural Information Processing Systems

Online learning to rank (OLTR) is a sequential decision-making problem in which a learning agent selects an ordered list of items and receives feedback through user clicks. Although attacks against OLTR algorithms could cause serious losses in real-world applications, little is known about adversarial attacks on OLTR. This paper studies attack strategies against multiple variants of OLTR. Our first result is an attack strategy against the UCB algorithm on classical stochastic bandits with binary feedback, which resolves the key difficulty, bounded and discrete feedback, that previous attack strategies cannot handle.
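The binary-feedback constraint can be made concrete with a small simulation. The sketch below is hypothetical and is not the paper's algorithm: since click feedback is binary, an attacker cannot subtract arbitrary reward, only flip an observed click, so one simple strategy is to suppress clicks on every non-target arm, driving a UCB1 learner toward the attacker's target arm.

```python
import math
import random

def ucb_with_click_attack(means, target, horizon, seed=0):
    """Simulate UCB1 on Bernoulli (click) arms while an attacker flips
    clicks (1 -> 0) on non-target arms -- a stand-in for the bounded,
    discrete feedback constraint the abstract highlights."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # pulls per arm
    rewards = [0.0] * k   # observed (post-attack) clicks per arm
    flips = 0             # attack cost: number of flipped clicks
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            # UCB1 index: empirical mean + exploration bonus
            arm = max(range(k), key=lambda a: rewards[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        click = 1 if rng.random() < means[arm] else 0
        if arm != target and click == 1:
            click = 0     # attacker suppresses the click
            flips += 1
        counts[arm] += 1
        rewards[arm] += click
    return counts, flips
```

With a target arm whose true click rate is far below the best arm's, the learner still ends up pulling the target most of the time, because every non-target arm appears to have empirical mean zero.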


Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models

Kang, Haidong, Wu, Wei, Wang, Hanling

arXiv.org Artificial Intelligence

Few-shot class-incremental learning (FSCIL) is a realistic and challenging continual-learning paradigm in which unseen classes must be learned incrementally from only a few training examples while avoiding catastrophic forgetting on base classes. Previous efforts have primarily centered on designing more effective FSCIL approaches; by contrast, far less attention has been devoted to the security of FSCIL. This paper provides a holistic study of the impact of attacks on FSCIL. We first derive insights by systematically exploring how attack methods designed by human experts (e.g., PGD, FGSM) affect FSCIL. We find that these methods either fail to attack base classes or incur large labor costs because they rely heavily on expert knowledge, which highlights the need for a specialized attack method for FSCIL. Grounded in these insights, we propose a simple yet effective method, ACraft, that leverages Large Language Models (LLMs) to automatically steer and discover optimal attack methods targeted at FSCIL without human experts. Moreover, to improve the interaction between the LLM and FSCIL, we introduce Proximal Policy Optimization (PPO) based reinforcement learning, which establishes a positive feedback loop so that the LLM generates better attack methods in each generation. Experiments on mainstream benchmarks show that ACraft significantly degrades the performance of state-of-the-art FSCIL methods and substantially outperforms human expert-designed attack methods while maintaining the lowest attack cost.



Deep learning models are vulnerable, but adversarial examples are even more vulnerable

Li, Jun, Xu, Yanwei, Li, Keran, Zhang, Xiaoli

arXiv.org Artificial Intelligence

Understanding intrinsic differences between adversarial examples and clean samples is key to enhancing DNN robustness and detection against adversarial attacks. This study first empirically finds that image-based adversarial examples are notably sensitive to occlusion. Controlled experiments on CIFAR-10 used nine canonical attacks (e.g., FGSM, PGD) to generate adversarial examples, paired with original samples for evaluation. We introduce Sliding Mask Confidence Entropy (SMCE) to quantify model confidence fluctuation under occlusion. Using 1800+ test images, SMCE calculations supported by Mask Entropy Field Maps and statistical distributions show adversarial examples have significantly higher confidence volatility under occlusion than originals. Based on this, we propose Sliding Window Mask-based Adversarial Example Detection (SWM-AED), which avoids catastrophic overfitting of conventional adversarial training. Evaluations across classifiers and attacks on CIFAR-10 demonstrate robust performance, with accuracy over 62% in most cases and up to 96.5%.
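One plausible formalization of the occlusion-sensitivity idea (our own reading of the abstract; the paper's exact SMCE definition may differ) slides a square mask over the image and averages the entropy of the model's softmax output across mask positions, so that occlusion-sensitive inputs, such as adversarial examples, score higher:

```python
import numpy as np

def smce(image, predict_proba, mask_size=8, stride=4, eps=1e-12):
    """Sliding-mask confidence entropy, a hypothetical sketch: occlude
    the image at each sliding-window position, query the model's
    softmax output, and average the per-position prediction entropies.
    `predict_proba` is any callable mapping an image to a probability
    vector (an assumption; the paper's interface is not specified)."""
    h, w = image.shape[:2]
    entropies = []
    for y in range(0, h - mask_size + 1, stride):
        for x in range(0, w - mask_size + 1, stride):
            occluded = image.copy()
            occluded[y:y + mask_size, x:x + mask_size] = 0.0
            p = np.clip(predict_proba(occluded), eps, 1.0)
            entropies.append(float(-(p * np.log(p)).sum()))
    return float(np.mean(entropies))
```

A detector in the spirit of SWM-AED would then threshold this score: inputs whose confidence fluctuates strongly under occlusion are flagged as likely adversarial.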


AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research

Beyer, Tim, Dornbusch, Jonas, Steimle, Jakob, Ladenburger, Moritz, Schwinn, Leo, Günnemann, Stephan

arXiv.org Artificial Intelligence

The rapid expansion of research on Large Language Model (LLM) safety and robustness has produced a fragmented and oftentimes buggy ecosystem of implementations, datasets, and evaluation methods. This fragmentation makes reproducibility and comparability across studies challenging, hindering meaningful progress. To address these issues, we introduce AdversariaLLM, a toolbox for conducting LLM jailbreak robustness research. Its design centers on reproducibility, correctness, and extensibility. The framework implements twelve adversarial attack algorithms, integrates seven benchmark datasets spanning harmfulness, over-refusal, and utility evaluation, and provides access to a wide range of open-weight LLMs via Hugging Face. The implementation includes advanced features for comparability and reproducibility, such as compute-resource tracking, deterministic results, and distributional evaluation techniques. AdversariaLLM also integrates judging through the companion package JudgeZoo, which can also be used independently. Together, these components aim to establish a robust foundation for transparent, comparable, and reproducible research in LLM safety.





e1c13a13fc6b87616b787b986f98a111-Supplemental.pdf

Neural Information Processing Systems

This section gives the worst-case time analysis for Algorithm 1, which yields the bound shown in Eq. 3.

B.1 Loss function space L. Recall that the loss-function search space is defined as:

(Loss Function Search Space)
L ::= targeted Loss, n with Z | untargeted Loss with Z | targeted Loss, n - untargeted Loss with Z
Z ::= logits | probs

To refer to the different settings, we use the following notation: U for the untargeted loss, T for the targeted loss, D for the targeted - untargeted (difference) loss, L for using logits, and P for using probs. Effectively, the search space includes all possible combinations, except that the cross-entropy loss supports only probabilities.

B.2 Attack algorithm & parameter space S. Recall the attack space defined as:

S ::= S; S | randomize S | EOT S, n | repeat S, n | try S for n | Attack with params with loss L

The type of every parameter is either integer or float. Generic parameters and the supported loss for each attack algorithm are defined in Table 4.

B.3 Search space conditioned on network property. Following Stutz et al. (2020), we use the robust test error (Rerr) metric and define robust accuracy as 1 - Rerr. Note, however, that Rerr as defined in Eq. 5 contains an intractable maximization problem in the denominator. Note also that we use a zero-knowledge detector model, so none of the attacks in the search space are aware of the detector.
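The loss-space grammar above can be enumerated mechanically. The sketch below is our own re-encoding (not the supplemental's code), using the U/T/D and L/P abbreviations from the text, and encodes the stated restriction that cross-entropy supports only probabilities:

```python
from itertools import product

# U: untargeted loss, T: targeted loss, D: targeted - untargeted loss
LOSS_KINDS = ("U", "T", "D")
# L: use logits, P: use probs
OUTPUTS = ("L", "P")

def loss_search_space(cross_entropy=False):
    """Enumerate valid (loss kind, output) combinations from the
    grammar; cross-entropy is the one case restricted to probs."""
    for kind, out in product(LOSS_KINDS, OUTPUTS):
        if cross_entropy and out != "P":
            continue  # cross-entropy supports only probabilities
        yield kind + out
```

This gives six combinations in general (UL, UP, TL, TP, DL, DP) and three when the loss is cross-entropy, matching the "all combinations except" rule stated above.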


Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models

Wen, Jinming, Wu, Xinyi, Zhao, Shuai, Jia, Yanhao, Li, Yuwen

arXiv.org Artificial Intelligence

Multimodal large language models (MLLMs), which bridge the gap between audio-visual and natural language processing, achieve state-of-the-art performance on several audio-visual tasks. Despite this superior performance, the scarcity of high-quality audio-visual training data and computational resources necessitates the use of third-party data and open-source MLLMs, a trend increasingly observed in contemporary research. This prosperity masks significant security risks: empirical studies demonstrate that the latest MLLMs can be manipulated to produce malicious or harmful content through instructions or inputs alone, including adversarial perturbations and malevolent queries, effectively bypassing the security mechanisms embedded within the models. To better understand the inherent security vulnerabilities of audio-visual multimodal models, a series of surveys has investigated various types of attacks, including adversarial and backdoor attacks. However, each existing survey on audio-visual attacks is limited to specific attack types, and a unified review across attack types is lacking. To address this gap and capture the latest trends in the field, this paper presents a comprehensive and systematic review of audio-visual attacks, covering adversarial attacks, backdoor attacks, and jailbreak attacks. Furthermore, the paper reviews these attacks against the latest audio-visual MLLMs, a dimension notably absent from existing surveys. Drawing on the insights of this review, the paper delineates both challenges and emerging trends for future research on audio-visual attacks and defenses.